Mastering Machine Learning with Spark 2.x by Tellez Alex & Pumperla Max & Malohlava Michal

Author: Tellez, Alex & Pumperla, Max & Malohlava, Michal
Language: eng
Format: azw3
Publisher: Packt Publishing
Published: 2017-08-31T04:00:00+00:00


Because we have already covered the preceding steps in Chapter 4, Predicting Movie Reviews Using NLP and Spark Streaming, we'll quickly reproduce them in this section.

As usual, we begin by starting the Spark shell, which is our working environment:

export SPARKLING_WATER_VERSION="2.1.12"
export SPARK_PACKAGES=\
"ai.h2o:sparkling-water-core_2.11:${SPARKLING_WATER_VERSION},\
ai.h2o:sparkling-water-repl_2.11:${SPARKLING_WATER_VERSION},\
ai.h2o:sparkling-water-ml_2.11:${SPARKLING_WATER_VERSION},\
com.packtpub:mastering-ml-w-spark-utils:1.0.0"

$SPARK_HOME/bin/spark-shell \
  --master 'local[*]' \
  --driver-memory 8g \
  --executor-memory 8g \
  --conf spark.executor.extraJavaOptions=-XX:MaxPermSize=384M \
  --conf spark.driver.extraJavaOptions=-XX:MaxPermSize=384M \
  --packages "$SPARK_PACKAGES" "$@"

In the prepared environment, we can directly load the data:

val DATASET_DIR = s"${sys.env.get("DATADIR").getOrElse("data")}/aclImdb/train"
val FILE_SELECTOR = "*.txt"

case class Review(label: Int, reviewText: String)

val positiveReviews = spark.read.textFile(s"$DATASET_DIR/pos/$FILE_SELECTOR")
  .map(line => Review(1, line)).toDF

val negativeReviews = spark.read.textFile(s"$DATASET_DIR/neg/$FILE_SELECTOR")
  .map(line => Review(0, line)).toDF

var movieReviews = positiveReviews.union(negativeReviews)
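The union simply stacks the two labeled DataFrames. The labeling scheme itself can be checked with plain Scala collections, independent of Spark; the sample review strings below are invented for illustration:

```scala
// Mirror of the Review case class used above.
case class Review(label: Int, reviewText: String)

// Positive reviews get label 1, negative reviews label 0; concatenation
// plays the role of positiveReviews.union(negativeReviews).
val positive = Seq("a wonderful film", "loved every minute").map(Review(1, _))
val negative = Seq("dull and overlong", "terrible acting").map(Review(0, _))
val all = positive ++ negative
```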

We also define a tokenization function that splits the reviews into tokens and removes common stop words:

import org.apache.spark.ml.feature.StopWordsRemover

val stopWords = StopWordsRemover.loadDefaultStopWords("english") ++ Array("ax", "arent", "re")
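A stop-word list like this is typically paired with a simple tokenizer. A minimal sketch of such a function in plain Scala follows; the helper name `toTokens` and the minimum-token-length parameter are our assumptions for illustration, not the book's exact API:

```scala
// Hypothetical tokenizer: lowercase the review, split on non-word
// characters, then drop short tokens and anything in the stop-word list.
def toTokens(minTokenLength: Int,
             stopWords: Array[String],
             review: String): Array[String] =
  review.toLowerCase
    .split("\\W+")
    .filter(t => t.length >= minTokenLength && !stopWords.contains(t))
```

For example, `toTokens(3, stopWords, reviewText)` keeps only tokens of at least three characters that are not stop words.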



